Skip to content

fix: prune stale container pins and move UpdateContainerPins to after compilation#39770

Merged
pelikhan merged 3 commits into
mainfrom
copilot/actions-lock-json-firewall-version-update
Jun 17, 2026
Merged

fix: prune stale container pins and move UpdateContainerPins to after compilation#39770
pelikhan merged 3 commits into
mainfrom
copilot/actions-lock-json-firewall-version-update

Conversation

Copilot AI commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

gh aw upgrade/update re-pinned digests for whatever container versions were already in actions-lock.json rather than the versions compile actually emits. After compilation bumped the bundled AWF version (e.g. 0.27.00.27.2), actions-lock.json retained the old entries indefinitely — no command reconciled the two.

Root cause

UpdateContainerPins ran before compilation in both upgrade and update. By the time compile wrote new AWF versions into the .lock.yml files, the pin-update step had already finished against the old lock files.

Changes

  • pkg/workflow/action_cache.go — add PruneStaleContainerPins(knownImages map[string]bool) int: removes ContainerPins entries whose keys are absent from the current lock-file image set; returns the pruned count.

  • pkg/cli/update_container_pins.go — before resolving new digests, build a set from the collected lock-file images and call PruneStaleContainerPins to evict superseded entries (e.g. old AWF tags). Also save the cache when entries were only pruned, not just when new pins were added.

  • pkg/cli/upgrade_command.go — move UpdateContainerPins from before compilation (old Step 3b) to after compilation (new Step 4b), so it reads lock files that already carry the current AWF version.

  • pkg/cli/update_command.go — move UpdateContainerPins to after UpdateActionsInWorkflowFiles (which drives compilation), for the same reason.

After this change, the containers block in actions-lock.json always reflects exactly the image tags present in the compiled .lock.yml files — stale versions are pruned, new versions are pinned.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@github-actions github-actions Bot mentioned this pull request Jun 17, 2026
- Add PruneStaleContainerPins to ActionCache to remove entries no longer
  referenced by compiled lock files
- Update UpdateContainerPins to prune stale entries and save when only
  pruning occurred (not just when new pins were added)
- Move UpdateContainerPins to after compilation in upgrade and update
  commands so it reads fresh lock files with the current AWF version
- Add tests for PruneStaleContainerPins and the pruning flow

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix drift between actions-lock.json and compiled .lock.yml fix: prune stale container pins and move UpdateContainerPins to after compilation Jun 17, 2026
Copilot AI requested a review from pelikhan June 17, 2026 12:32
@pelikhan pelikhan marked this pull request as ready for review June 17, 2026 12:39
Copilot AI review requested due to automatic review settings June 17, 2026 12:39

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes container digest pin drift by pruning stale entries from actions-lock.json and reordering when pins are updated so the pin updater reads the post-compile .lock.yml image set.

Changes:

  • Added ActionCache.PruneStaleContainerPins to evict container pins not referenced by current compiled lock files.
  • Updated container pin update flow to prune stale pins and persist the cache even when only pruning occurred.
  • Moved UpdateContainerPins to run after compilation in both upgrade and update flows.
Show a summary per file
File Description
pkg/workflow/action_cache.go Adds stale container-pin pruning helper on the cache.
pkg/workflow/action_cache_container_pin_test.go Adds unit tests covering prune behavior (stale/all/none/nil).
pkg/cli/update_container_pins.go Prunes stale container pins based on lock-file image set and saves on prune-only updates.
pkg/cli/update_container_pins_test.go Adds a test exercising pruning behavior using a real cache file on disk.
pkg/cli/upgrade_command.go Moves container pin updates to after compilation in upgrade.
pkg/cli/update_command.go Moves container pin updates to after workflow update/compile in update.
.github/workflows/test-quality-sentinel.lock.yml Regenerated lock content (schema key change).
.github/workflows/smoke-copilot.lock.yml Regenerated lock content (schema key change).
.github/workflows/smoke-copilot-arm.lock.yml Regenerated lock content (schema key change).
.github/workflows/smoke-copilot-aoai-entra.lock.yml Regenerated lock content (schema key change).
.github/workflows/smoke-copilot-aoai-apikey.lock.yml Regenerated lock content (schema key change).
.github/workflows/smoke-claude.lock.yml Regenerated lock content (schema key change).
.github/workflows/security-review.lock.yml Regenerated lock content (schema key change).
.github/workflows/refiner.lock.yml Regenerated lock content (schema key change).
.github/workflows/pr-triage-agent.lock.yml Regenerated lock content (schema key change).
.github/workflows/pr-nitpick-reviewer.lock.yml Regenerated lock content (schema key change).
.github/workflows/pr-code-quality-reviewer.lock.yml Regenerated lock content (schema key change).
.github/workflows/mattpocock-skills-reviewer.lock.yml Regenerated lock content (schema key change).
.github/workflows/grumpy-reviewer.lock.yml Regenerated lock content (schema key change).

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 19/19 changed files
  • Comments generated: 1

Comment on lines +91 to +96
// Build a set of images currently referenced in the compiled lock files so
// that stale entries (e.g. superseded AWF versions) can be pruned.
imageSet := make(map[string]bool, len(images))
for _, img := range images {
imageSet[img] = true
}
@github-actions

github-actions Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

🧠 Matt Pocock Skills Reviewer failed to deliver outputs during the skills-based review.

@github-actions

github-actions Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Design Decision Gate 🏗️ completed the design decision gate check.

@github-actions

github-actions Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Test Quality Sentinel completed test quality analysis.

@github-actions

github-actions Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

PR Code Quality Reviewer completed the code quality review.

@github-actions

Copy link
Copy Markdown
Contributor

🏗️ Design Decision Gate — ADR Required

This PR makes significant changes to core business logic (213 new lines across pkg/cli/ and pkg/workflow/) but does not have a linked Architecture Decision Record (ADR).

⚠️ I generated a draft ADR but could not commit it to your branch automatically — the push step ran against a detached-HEAD git context. The full draft is included below; please add it as docs/adr/39770-reconcile-container-pins-after-compilation.md.

🔒 This PR cannot merge until an ADR is linked in the PR body.

📄 Draft ADR — copy into docs/adr/39770-reconcile-container-pins-after-compilation.md
# ADR-39770: Reconcile Container Pins After Compilation

**Date**: 2026-06-17
**Status**: Draft

## Context

`gh aw upgrade` and `gh aw update` maintain SHA-256 digest pins for container images in `.github/aw/actions-lock.json`. Previously `UpdateContainerPins` ran *before* compilation, so it resolved pins against the lock files that existed at the start of the run. When compilation subsequently bumped the bundled AWF version (e.g. `0.27.0` -> `0.27.2`) inside the generated `.lock.yml` files, no step reconciled `actions-lock.json` afterward. As a result the `containers` block retained superseded version entries indefinitely and never gained pins for the newly emitted versions, causing the lock file to drift from what `compile` actually produces.

## Decision

We will treat `actions-lock.json`'s `containers` block as a *derived projection* of the compiled `.lock.yml` files. To enforce this, we will (1) move `UpdateContainerPins` to run **after** compilation in both `upgrade` (new Step 4b) and `update`, so it reads lock files that already carry the current AWF version, and (2) add `PruneStaleContainerPins`, which removes any pin whose image key is absent from the current lock-file image set before resolving new digests. The cache is now saved when entries were only pruned, not solely when new pins were added.

## Alternatives Considered

### Alternative 1: Keep pinning before compilation and re-run it again afterward

Run `UpdateContainerPins` both before and after compile. Rejected because it doubles the (potentially slow, Docker-dependent) digest-resolution work and still requires pruning logic to remove stale entries.

### Alternative 2: Prune stale pins without reordering the step

Add `PruneStaleContainerPins` but leave `UpdateContainerPins` before compilation. Rejected because, run before compilation, the step still cannot see the newly emitted AWF version, so it would prune the old pin yet fail to add the new one.

## Consequences

### Positive
- `actions-lock.json` always reflects exactly the image tags present in the compiled lock files.
- Eliminates indefinite accumulation of superseded AWF version entries.

### Negative
- Digest resolution now happens later in the flow, so pins are not available during the compile step of the same run.
- Pruning is keyed on exact image strings; a change to how images are referenced could prune entries still logically in use.

### Neutral
- `UpdateContainerPins` remains non-fatal when Docker is unavailable; pruning still occurs and the cache is saved if anything changed.
- Behavior under `--no-compile` is unchanged in spirit: pins are pruned and refreshed against whatever lock files are currently on disk.
📋 What to do next
  1. Review the draft ADR above — it was generated from the PR diff.
  2. Complete the missing sections — add context the AI couldn't infer and refine the rationale.
  3. Commit the finalized ADR to docs/adr/39770-reconcile-container-pins-after-compilation.md on your branch.
  4. Reference the ADR in this PR body, e.g.:

    ADR: ADR-39770: Reconcile Container Pins After Compilation

Once an ADR is linked in the PR body, this gate will re-run and verify the implementation matches the decision.

❓ Why ADRs Matter

ADRs create a searchable, permanent record of why the codebase looks the way it does. Future contributors (and your future self) will thank you.

📋 Michael Nygard ADR Format Reference

An ADR must contain these four sections to be considered complete:

  • Context — What is the problem? What forces are at play?
  • Decision — What did you decide? Why?
  • Alternatives Considered — What else could have been done?
  • Consequences — What are the trade-offs (positive and negative)?

All ADRs are stored in docs/adr/ numbered by PR number (e.g., 39770-...md).

🔒 This PR cannot merge until an ADR is linked in the PR body.

🏗️ ADR gate enforced by Design Decision Gate 🏗️ ·

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skills-Based Review 🧠

Applied /diagnose, /tdd, and /grill-with-docs — requesting changes on one correctness gap and a test coverage issue.

📋 Key Themes & Highlights

Correctness

  • Early-exit skips prune (update_container_pins.go:73-77): when collectImagesFromLockFiles returns an empty slice, PruneStaleContainerPins is never called. Stale pins accumulate indefinitely if all container workflows are removed. The prune step should run with an empty imageSet before the early return.

Test Coverage

  • Integration test bypasses UpdateContainerPins: TestUpdateContainerPins_PrunesStaleEntries manually re-implements the cache-load + prune wiring from inside UpdateContainerPins rather than calling the function. Future internal refactors won't be caught.
  • Missing edge-case test: no test for "empty lock files, non-empty stale cache" — the exact scenario the early-exit bug affects.
  • No regression guard for step ordering: moving UpdateContainerPins to after compilation is the primary fix, but nothing would fail if the steps were accidentally re-ordered.

Documentation

  • Unexplained .lock.yml schema changes: 13 files change optionalPositiveIntegerissueOrPRNumber, which is unrelated to container pins and not mentioned in the PR description.

Positive Highlights

  • ✅ Root cause is clearly identified and symmetrically fixed in both upgrade and update command paths
  • PruneStaleContainerPins is clean: nil-safe, sets dirty flag correctly, logs at the right level
  • ✅ Save-trigger fix (len(updatedImages) > 0 || prunedCount > 0) is exactly right
  • TestPruneStaleContainerPins_NoneStale verifies dirty is not set when nothing changes — a subtle but important guard
  • ✅ Four unit tests cover the full edge-case matrix for the new function

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer

Comments that could not be inline-anchored

pkg/cli/update_container_pins.go:77

[/diagnose] The len(images) == 0 early return skips PruneStaleContainerPins entirely. If a user removes all container-based workflows, collectImagesFromLockFiles returns an empty slice, the function exits here, and stale pins in actions-lock.json are never cleaned up.

<details>
<summary>💡 Suggested fix</summary>

Load the cache and run prune before the early exit, passing an empty imageSet when no images are found:

// Load cache + always prune (empty set removes all sta…

</details>

<details><summary>pkg/cli/update_container_pins_test.go:220</summary>

**[/tdd]** This test manually re-implements the `collectImagesFromLockFiles → imageSet → PruneStaleContainerPins` wiring that lives *inside* `UpdateContainerPins` rather than calling the function itself. If that internal wiring changes, this test stays green even though the real code path is broken.

&lt;details&gt;
&lt;summary&gt;💡 Suggestion&lt;/summary&gt;

The Docker constraint is real, but consider extracting the prune-and-save logic into a helper that accepts a pre-loaded cache, making the real code path</details>

<details><summary>pkg/cli/upgrade_command.go:332</summary>

**[/diagnose]** The step ordering`UpdateContainerPins` running after compilationis the core behavioral fix in this PR, but there&#39;s no automated guard that enforces it. If Step 4b were accidentally moved back above Step 4, the bug would silently return with no failing test.

&lt;details&gt;
&lt;summary&gt;💡 Suggestion&lt;/summary&gt;

Even a lightweight comment noting the ordering constraint is load-bearing helps reviewers:

```go
// Step 4b: MUST run after Step 4 (compilation).
// Compilation rewrites .lo…

</details>

<details><summary>pkg/cli/update_container_pins_test.go:176</summary>

**[/tdd]** Missing edge case: &quot;no images in lock files, but stale pins in cache.&quot; Due to the early exit at line 73–77, `PruneStaleContainerPins` is never called in this scenario, so stale pins persist. A failing test here both documents the expected behaviour (pins should be cleared) and acts as a regression guard once the early-exit issue is fixed.

&lt;details&gt;
&lt;summary&gt;💡 Suggested test skeleton&lt;/summary&gt;

```go
func TestPruneStillRunsWhenNoLockFiles(t *testing.T) {
    // actions-lock.json has…

</details>

<details><summary>.github/workflows/grumpy-reviewer.lock.yml:745</summary>

**[/grill-with-docs]** These 13 `.lock.yml` changes (`optionalPositiveInteger``issueOrPRNumber` on `pull_request_number` parameters) appear to be a separate schema evolution that was bundled in because lock files were regenerated while testing this fix. They&#39;re not mentioned in the PR description and could confuse reviewers who are trying to understand the container pin scope.

If these are intentional (e.g., to support `#aw_` temporary ID references on `pull_request_number`), a single sente</details>

<details><summary>pkg/workflow/action_cache_container_pin_test.go:198</summary>

**[/tdd]**Setting `cache.dirty = false` before the call and then asserting `assert.False(t, cache.dirty, ...)` is a great touchit explicitly guards against spurious saves when nothing changes. Easy to miss, important to have.

</details>

@github-actions

Copy link
Copy Markdown
Contributor

🧪 Test quality analysis by Test Quality Sentinel ·

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Test Quality Sentinel: 80/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%).

@pelikhan

Copy link
Copy Markdown
Collaborator

@copilot run pr-finisher skill

Copilot AI commented Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

@copilot run pr-finisher skill

Ran a full pr-finisher pass on this branch (HEAD 4a1a76a): local validation succeeded (make fmt, make lint, make test-unit, make test, make recompile), and prior CI failure logs were reviewed (safe-outputs 502 during review submission). No additional code changes were needed in this pass.

@pelikhan pelikhan merged commit 97be761 into main Jun 17, 2026
89 of 90 checks passed
@pelikhan pelikhan deleted the copilot/actions-lock-json-firewall-version-update branch June 17, 2026 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants